    Learning to Reconstruct Shapes from Unseen Classes

    From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life. Contemporary single-image 3D reconstruction algorithms aim to solve this task in a similar fashion, but often end up with priors that are highly biased by training classes. Here we present an algorithm, Generalizable Reconstruction (GenRe), designed to capture more generic, class-agnostic shape priors. We achieve this with an inference network and training procedure that combine 2.5D representations of visible surfaces (depth and silhouette), spherical shape representations of both visible and non-visible surfaces, and 3D voxel-based representations, in a principled manner that exploits the causal structure of how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe performs well on single-view shape reconstruction, and generalizes to diverse novel objects from categories not seen during training.
    Comment: NeurIPS 2018 (Oral). The first two authors contributed equally to this paper. Project page: http://genre.csail.mit.edu
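
    The abstract describes a three-stage pipeline: predict visible-surface depth and silhouette, complete a spherical shape representation, then refine a voxel grid. Below is a minimal PyTorch-style sketch of that structure; all module definitions, tensor shapes, and the two projection helpers are hypothetical stand-ins, not the authors' code.

```python
import torch
import torch.nn as nn

def project_to_spherical(depth_sil):
    # Placeholder: GenRe's real step is a fixed (non-learned) geometric
    # reprojection of the visible surface onto a sphere.
    return depth_sil[:, :1]

def backproject_to_voxels(sph_full):
    # Placeholder: the real step inverse-projects the completed sphere
    # into a voxel grid; here we just allocate an empty 32^3 grid.
    return sph_full.new_zeros(sph_full.shape[0], 1, 32, 32, 32)

class GenReSketch(nn.Module):
    """Toy stand-in for the 2.5D -> spherical -> voxel pipeline."""
    def __init__(self):
        super().__init__()
        self.depth_net = nn.Conv2d(3, 2, 3, padding=1)      # RGB -> depth + silhouette
        self.spherical_net = nn.Conv2d(1, 1, 3, padding=1)  # inpaint non-visible surfaces
        self.voxel_net = nn.Conv3d(1, 1, 3, padding=1)      # refine 3D occupancy

    def forward(self, image):
        depth_sil = self.depth_net(image)             # 2.5D estimate of visible surfaces
        sph = project_to_spherical(depth_sil)         # fixed geometric projection
        sph_full = self.spherical_net(sph)            # hallucinate the unseen side
        voxels = backproject_to_voxels(sph_full)      # fixed geometric back-projection
        return torch.sigmoid(self.voxel_net(voxels))  # final occupancy probabilities
```

    Keeping the projections fixed and non-learned, with learning confined to the 2.5D and inpainting stages, is what the abstract credits for the class-agnostic behavior.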

    Visual Object Networks: Image Generation with Disentangled 3D Representation

    Recent progress in deep generative models has led to tremendous breakthroughs in image generation. However, while existing models can synthesize photorealistic images, they lack an understanding of our underlying 3D world. We present a new generative model, Visual Object Networks (VON), synthesizing natural images of objects with a disentangled 3D representation. Inspired by classic graphics rendering pipelines, we unravel our image formation process into three conditionally independent factors (shape, viewpoint, and texture) and present an end-to-end adversarial learning framework that jointly models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes. It then renders the object's 2.5D sketches (i.e., silhouette and depth map) from its shape under a sampled viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches to generate natural images. The VON not only generates images that are more realistic than state-of-the-art 2D image synthesis methods, but also enables many 3D operations such as changing the viewpoint of a generated image, editing of shape and texture, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.
    Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
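
    The shape-then-viewpoint-then-texture factorization can be summarized in a short, hypothetical PyTorch sketch; the generators and the crude orthographic projection below are illustrative stand-ins, not VON's released code (see the GitHub link above for that).

```python
import torch
import torch.nn as nn

def render_25d(voxels, viewpoint=None):
    # Stand-in for VON's differentiable projection to a 2.5D sketch;
    # the viewpoint argument is ignored in this crude orthographic version.
    sil = voxels.amax(dim=2)                                # (B, 1, H, W) silhouette
    depth = voxels.argmax(dim=2).float() / voxels.shape[2]  # (B, 1, H, W) depth proxy
    return torch.cat([sil, depth], dim=1)                   # (B, 2, H, W)

class VONSketch(nn.Module):
    """Toy stand-in: shape code -> voxels -> 2.5D sketch -> textured image."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.shape_gen = nn.Linear(z_dim, 32 ** 3)        # stand-in 3D shape generator
        self.texture_gen = nn.Conv2d(2, 3, 3, padding=1)  # stand-in sketch -> RGB network

    def forward(self, z_shape, viewpoint=None):
        voxels = torch.sigmoid(self.shape_gen(z_shape)).view(-1, 1, 32, 32, 32)
        sketch = render_25d(voxels, viewpoint)            # silhouette + depth at a viewpoint
        return torch.tanh(self.texture_gen(sketch))       # add texture to the 2.5D sketch
```

    Because the three factors are conditionally independent, each 3D operation listed in the abstract corresponds to varying one input (viewpoint, shape code, or texture stage) while holding the others fixed.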

    Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

    We study 3D shape modeling from a single image and make contributions to it in three aspects. First, we present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc. Building such a large-scale dataset, however, is highly challenging; existing datasets either contain only synthetic data, or lack precise alignment between 2D images and 3D shapes, or only have a small number of images. Second, we calibrate the evaluation criteria for 3D shape reconstruction through behavioral studies, and use them to objectively and systematically benchmark cutting-edge reconstruction algorithms on Pix3D. Third, we design a novel model that simultaneously performs 3D reconstruction and pose estimation; our multi-task learning approach achieves state-of-the-art performance on both tasks.
    Comment: CVPR 2018. The first two authors contributed equally to this work. Project page: http://pix3d.csail.mit.edu
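
    The third contribution, joint reconstruction and pose estimation, amounts to a shared image encoder with two heads trained under one multi-task loss. The sketch below is a hypothetical minimal version of that design; the architecture and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconPoseSketch(nn.Module):
    """Toy multi-task model: shared encoder, voxel head, pose head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())      # image -> 256-dim feature
        self.voxel_head = nn.Linear(256, 32 ** 3)       # feature -> occupancy logits
        self.pose_head = nn.Linear(256, 4)              # feature -> rotation quaternion

    def forward(self, image):
        f = self.encoder(image)
        voxels = self.voxel_head(f).view(-1, 32, 32, 32)
        quat = F.normalize(self.pose_head(f), dim=1)    # unit quaternion for pose
        return voxels, quat

# A multi-task objective would sum a reconstruction term and a pose term, e.g.
# loss = F.binary_cross_entropy_with_logits(voxels, gt_voxels) + pose_loss(quat, gt_quat)
```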

    End-to-End Optimization of Scene Layout

    We propose an end-to-end variational generative model for scene layout synthesis conditioned on scene graphs. Unlike unconditional scene layout generation, we use scene graphs as an abstract but general representation to guide the synthesis of diverse scene layouts that satisfy the relationships included in the scene graph. This gives rise to more flexible control over the synthesis process, allowing various forms of inputs such as scene layouts extracted from sentences or inferred from a single color image. Using our conditional layout synthesizer, we can generate various layouts that share the same structure as the input example. In addition to this conditional generation design, we also integrate a differentiable rendering module that enables layout refinement using only 2D projections of the scene. Given a depth map and a semantics map, the differentiable rendering module enables optimizing the synthesized layout to fit the given input in an analysis-by-synthesis fashion. Experiments suggest that our model achieves higher accuracy and diversity in conditional scene synthesis and allows exemplar-based scene generation from various input forms.
    Comment: CVPR 2020 (Oral). Project page: http://3dsln.csail.mit.edu
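
    The analysis-by-synthesis refinement described at the end of the abstract is, in essence, gradient descent on layout parameters through a differentiable renderer. A hypothetical sketch follows: the `render_depth` callable stands in for the paper's differentiable rendering module, and the L1 depth loss is an illustrative choice (the paper also fits a semantics map).

```python
import torch

def refine_layout(layout, target_depth, render_depth, steps=200, lr=1e-2):
    """Optimize layout parameters so the rendered depth matches the observation."""
    layout = layout.clone().requires_grad_(True)
    opt = torch.optim.Adam([layout], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = render_depth(layout)                # differentiable 2D projection of the 3D layout
        loss = (pred - target_depth).abs().mean()  # fit the observed depth map
        loss.backward()
        opt.step()
    return layout.detach()
```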

    Inferring Shape and Material from Sound

    Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine with such competency, however, is very challenging. One possible solution is to rely on supervised learning, which requires a large-scale dataset containing sounds of various objects with clean labels on their appearance, shape, and material. However, it is difficult and expensive to capture such a dataset. Another approach is to tackle the problem in an analysis-by-synthesis framework, where we iteratively update the current estimates given a generative model. This, however, requires sophisticated generative models, which are usually too computationally expensive to support iterative inference. Finally, despite the popularity of deep learning methods in auditory perception tasks, most of them are derived from visual recognition tasks and may not be suitable for processing audio. To address these difficulties, we first present a novel, open-source pipeline that generates audio-visual data purely from 3D object shapes and their physical properties. Using this generative model, we construct a synthetic audio-visual dataset, Sound-20K, for object perception tasks. We further demonstrate that representations learned on synthetic audio-visual data transfer to real-world scenarios. In addition, the generative model can be made efficient enough to support iterative inference, where we construct an analysis-by-synthesis framework that infers an object's shape and material from the sound of it falling on the ground.
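
    Since the physics-based audio synthesizer is generative but not necessarily differentiable, the analysis-by-synthesis loop can be run as hypothesis scoring: synthesize a sound for each candidate (shape, material) pair and keep the closest match. Below is a hypothetical sketch, where `synthesize` and `feature` stand in for the paper's generative model and a spectrogram-like feature extractor.

```python
import numpy as np

def infer_shape_material(observed_sound, synthesize, feature, candidates):
    """Score candidate (shape, material) hypotheses against the observed sound."""
    target = feature(observed_sound)
    best, best_err = None, np.inf
    for shape, material in candidates:
        synth = synthesize(shape, material)            # render the hypothesized impact sound
        err = np.linalg.norm(feature(synth) - target)  # distance in feature space
        if err < best_err:
            best, best_err = (shape, material), err
    return best
```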

    Dataset on the mechanical property of graphite after molten FLiNaK salt infiltration

    Presented in this article are mechanical property and microstructural data for fluoride-molten-salt-infiltrated graphite at high temperature. Four infiltration pressures (0 kPa, 450 kPa, 600 kPa, and 1000 kPa) and two grades of graphite (IG-110 and NG-CT-10) were used during molten salt infiltration. After infiltration, compression and tension tests were performed at 700 °C to determine compressive strength, tensile strength, softening coefficient, stress–strain curves, and absorbed energy. Scanning electron microscopy (SEM) of fracture fragments provided micrographs of the fracture surfaces of molten-salt-infiltrated and virgin graphite.
    Keywords: Graphite, Fluoride molten salt, Infiltration, Mechanical strength
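
    Two of the derived quantities can be computed mechanically from a stress–strain curve: absorbed energy is the area under the curve, and the softening coefficient is conventionally the ratio of treated strength to virgin strength. That definition is an assumption here, as are all of the toy numbers in this sketch.

```python
import numpy as np

# Toy stress-strain curve (made-up values, not from the dataset).
strain = np.linspace(0.0, 0.02, 50)         # dimensionless
stress = 80e6 * strain / (1 + 40 * strain)  # Pa

# Absorbed energy: area under the stress-strain curve (trapezoidal rule), J/m^3.
absorbed_energy = np.sum(0.5 * (stress[1:] + stress[:-1]) * np.diff(strain))

# Softening coefficient: infiltrated strength over virgin strength (toy MPa values).
softening_coeff = 28.0 / 32.0

print(f"absorbed energy ~ {absorbed_energy:.0f} J/m^3, "
      f"softening coefficient ~ {softening_coeff:.2f}")
```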

    Peroxygenase-Catalyzed Selective Synthesis of Calcitriol Starting from Alfacalcidol

    Calcitriol is an active analog of vitamin D3 and has excellent physiological activity in regulating healthy immune function. To synthesize calcitriol, total synthesis is often adopted, which typically involves multiple steps and results in a low overall yield. Herein, we envisioned an enzymatic approach to the synthesis of calcitriol. Peroxygenase from Agrocybe aegerita (AaeUPO) was used as a catalyst to hydroxylate the C-H bond at the C-25 position of alfacalcidol, yielding calcitriol in a single step. The enzymatic reaction achieved 80.3% product formation with excellent selectivity and a turnover number of up to 4000. In a semi-preparative-scale synthesis, a 72% isolated yield was obtained. We also found that AaeUPO is capable of hydroxylating the C-H bond at the C-1 position of vitamin D3, thereby enabling calcitriol synthesis directly from vitamin D3.
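
    As a quick consistency check on the quoted figures, assuming the usual definition of turnover number as moles of product per mole of enzyme:

```python
# Back-of-the-envelope check of the abstract's numbers.
conversion = 0.803  # 80.3% product formation
ton = 4000          # turnover number: mol product per mol AaeUPO
substrate_per_enzyme = ton / conversion
print(f"implied substrate:enzyme ratio ~ {substrate_per_enzyme:.0f}:1")  # ~4981:1
```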

    Consistent depth of moving objects in video

    We present a method to estimate the depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this under-constrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time training framework in which a depth-prediction CNN is trained in tandem with an auxiliary scene-flow-prediction MLP over the entire input video. By recursively unrolling the scene-flow MLP over varying time steps, we compute both short-range scene flow, to impose local smooth-motion priors directly in 3D, and long-range scene flow, to impose multi-view consistency constraints across wide baselines. We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars) as well as camera motion. Our depth maps give rise to a number of depth- and motion-aware video editing effects, such as object and lighting insertion.
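
    The core mechanism, test-time training of a depth CNN jointly with a scene-flow MLP on a single video, can be sketched as below. Everything here is a hypothetical stand-in: the pixel-aligned L1 loss replaces the paper's reprojection-based consistency terms, and `lift_to_3d` omits the camera intrinsics and poses the real method uses.

```python
import torch

def lift_to_3d(depth):
    # Placeholder unprojection for a (H, W) depth map: tack depth onto a
    # pixel grid instead of using calibrated camera intrinsics and poses.
    h, w = depth.shape[-2:]
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs.float(), ys.float()], dim=-1).to(depth.device)
    return torch.cat([grid, depth.reshape(h, w, 1)], dim=-1).reshape(-1, 3)

def test_time_train(frames, depth_cnn, flow_mlp, steps=2000, lr=1e-4):
    """Jointly fit depth and 3D scene flow to one input video (toy version).
    depth_cnn: frame -> (H, W) depth; flow_mlp: (points, t) -> 3D displacement."""
    params = list(depth_cnn.parameters()) + list(flow_mlp.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for step in range(steps):
        i = step % (len(frames) - 1)            # short-range pair (i, i+1); long-range
        pts = lift_to_3d(depth_cnn(frames[i]))  # pairs would unroll the MLP further
        pts = pts + flow_mlp(pts, i)            # predicted 3D motion from frame i to i+1
        target = lift_to_3d(depth_cnn(frames[i + 1]))
        loss = (pts - target).abs().mean()      # stand-in consistency loss
        opt.zero_grad(); loss.backward(); opt.step()
```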